ABSTRACT
This paper considers the problem of Byzantine fault-tolerance in distributed multi-agent optimization. In this problem, each agent has a local cost function, and in the fault-free case, the goal is to design a distributed algorithm that allows all the agents to find a minimum point of all the agents' aggregate cost function. We consider a scenario where some agents might be Byzantine faulty, which renders the original goal of computing a minimum point of all the agents' aggregate cost vacuous. A more reasonable objective for an algorithm in this scenario is to allow all the non-faulty agents to compute a minimum point of only the non-faulty agents' aggregate cost. Prior work shows that if there are up to f (out of n) Byzantine agents, then a minimum point of the non-faulty agents' aggregate cost can be computed exactly if and only if the non-faulty agents' costs satisfy a certain redundancy property called 2f-redundancy. However, 2f-redundancy is an ideal property that can be satisfied only in systems free from noise or uncertainties, which can make the goal of exact fault-tolerance unachievable in some applications. Thus, we introduce the notion of (f, ε)-resilience, a generalization of exact fault-tolerance wherein the objective is to find an approximate minimum point of the non-faulty aggregate cost, with ε accuracy. This approximate fault-tolerance can be achieved under a weaker condition that is easier to satisfy in practice than 2f-redundancy. We obtain necessary and sufficient conditions for achieving (f, ε)-resilience, characterizing the correlation between relaxation in redundancy and approximation in resilience. When the agents' cost functions are differentiable, we obtain conditions for (f, ε)-resilience of the distributed gradient-descent method when equipped with robust gradient aggregation, such as comparative gradient elimination or coordinate-wise trimmed mean.
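The two robust aggregation rules named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes the n agents' gradients are stacked as an (n, d) NumPy array, and the function names are chosen here for clarity.

```python
import numpy as np

def cge_aggregate(grads, f):
    """Comparative gradient elimination (sketch): discard the f gradients
    with the largest Euclidean norms, then average the remaining n - f."""
    norms = np.linalg.norm(grads, axis=1)
    keep = np.argsort(norms)[: len(grads) - f]  # indices of n - f smallest norms
    return np.mean(grads[keep], axis=0)

def trimmed_mean_aggregate(grads, f):
    """Coordinate-wise trimmed mean (sketch): in each coordinate, drop the
    f largest and f smallest entries, then average the remaining n - 2f."""
    s = np.sort(grads, axis=0)  # sort each coordinate independently
    return np.mean(s[f : len(grads) - f], axis=0)
```

In a distributed gradient-descent loop, the server would apply one of these rules to the received gradients before taking a descent step; both rules bound the influence that up to f Byzantine gradients can have on the update.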